Fast Outlier Detection Using a Grid-Based Algorithm
Authors
Abstract
As a data mining technique, outlier detection aims to discover observations that deviate substantially from the remainder of the data. Recently, the Local Outlier Factor (LOF) algorithm has been successfully applied to outlier detection. However, owing to its computational complexity, its application to large, high-dimensional data has been limited. This paper proposes a grid-based algorithm that reduces the computation time the LOF algorithm requires to determine the k-nearest neighbors. The algorithm divides the data space into a smaller number of regions, called a "grid", and calculates the LOF value of each grid. To examine the effectiveness of the proposed method, several experiments with different parameters were conducted. The proposed method demonstrated a significant reduction in computation time with predictable and acceptable trade-off errors. The proposed methodology was then successfully applied to real database transaction logs of the Korea Atomic Energy Research Institute. The results show that, for a very large dataset, grid-LOF can be considered an acceptable approximation of the original LOF. Moreover, it can be used effectively for real-time outlier detection.
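The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how a grid is represented or how its LOF is computed, so the sketch assumes that each occupied cell is represented by the centroid of its points, that LOF is computed over those centroids, and that every point inherits the score of its cell. The names `lof_scores`, `grid_lof`, `cell_size`, and `k` are hypothetical, and the LOF routine is a simplified brute-force version that uses exactly k neighbors and ignores distance ties.

```python
import numpy as np


def lof_scores(X, k):
    """Simplified brute-force LOF: exactly k neighbors, ties ignored."""
    # Pairwise Euclidean distances; a point is never its own neighbor.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]            # indices of the k nearest neighbors
    nn_dist = np.take_along_axis(d, nn, axis=1)  # distances to those neighbors
    k_dist = nn_dist[:, -1]                      # k-distance of each point
    # Reachability distance: reach(p, o) = max(k_dist(o), d(p, o)).
    reach = np.maximum(k_dist[nn], nn_dist)
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)     # local reachability density
    return lrd[nn].mean(axis=1) / lrd


def grid_lof(X, cell_size, k):
    """Approximate per-point LOF by scoring occupied grid cells (assumed scheme)."""
    # Map every point to the integer coordinates of its grid cell.
    cells = np.floor((X - X.min(axis=0)) / cell_size).astype(int)
    uniq, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.ravel()                    # cell index of each point
    # Assumption: a cell is represented by the centroid of its points.
    centroids = np.zeros((len(uniq), X.shape[1]))
    np.add.at(centroids, inverse, X)
    centroids /= np.bincount(inverse)[:, None]
    # LOF is computed on the (far fewer) centroids, then copied back to points.
    return lof_scores(centroids, k)[inverse]


# Usage: a dense cluster plus one distant point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), [[10.0, 10.0]]])
scores = grid_lof(X, cell_size=0.5, k=5)
```

Because the neighbor search runs over occupied cells rather than individual points, its cost depends on the number of cells. The price is that all points in one cell receive the same score, which is the kind of approximation error the abstract refers to. `k` must be smaller than the number of occupied cells.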
Similar resources
Fast Top-k Distance-Based Outlier Detection on Uncertain Data
This paper studies the problem of top-k distance-based outlier detection on uncertain data. In this work, an uncertain object is modelled by a probability density function of a Gaussian distribution. We start with the Naive approach. We then introduce a populated-cell list (PC-list), a sorted list of non-empty cells of a grid (grid is used to index our data). Using PC-list, our top-k outlier de...
Very Fast Load Flow Calculation Using Fast-Decoupled Reactive Power Compensation Method for Radial Active Distribution Networks in Smart Grid Environment Based on Zooming Algorithm
Distribution load flow (DLF) calculation is one of the most important tools in distribution networks. DLF tools must be able to perform fast calculations in real-time studies in the presence of distributed generators (DGs) in a smart grid environment, even when the network topology changes. In this paper, a new method for DLF in radial active distribution networks is proposed. The ...
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using the vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to incorrect modeling, biased parameter estimates, and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
Outlier Detection for Pedestrian Movement
Fast development of tracking devices has made trajectory outlier detection (TOD) possible and meaningful. Given a set of trajectories T, a TOD algorithm outputs a subset of T whose trajectories differ from most of the other trajectories in some aspect(s). These trajectories, namely outliers, can indicate important or interesting information and are thus worth noticing. TOD techniqu...
Online Bivariate Outlier Detection in Final Test Using Kernel Density Estimation
In parametric IC testing, outlier detection is applied to filter out potential unreliable devices. Most outlier detection methods are used in an offline setting and hence are not applicable to Final Test, where immediate pass/fail decisions are required. Therefore, we developed a new bivariate online outlier detection method that is applicable to Final Test without making assumptions about a sp...
Journal title:
Volume 11, Issue
Pages -
Publication date: 2016